multi-relational data
1cecc7a77928ca8133fa24680a88d2f9-Reviews.html
The authors propose a simple and scalable approach to modeling multi-relational data using low-dimensional vector embeddings of entities, with the relationships between embeddings captured using offset vectors. The embeddings are learned by training a margin-based ranking model to score the observed (entity1, relationship, entity2) triples higher than the unobserved ones. Though the proposed model can be seen as a special case of several existing models (e.g. The approach is well motivated and clearly described. The empirical evaluation is reasonably well done, but the write-up could be better.
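The margin-based ranking objective mentioned in the review can be illustrated with a minimal sketch. It assumes that a higher score means a more plausible triple and that each observed triple is paired with one unobserved (corrupted) triple; the function name `margin_ranking_loss` and the default margin of 1.0 are illustrative choices, not taken from the reviewed paper.

```python
def margin_ranking_loss(pos_scores, neg_scores, margin=1.0):
    """Mean hinge loss over paired scores.

    pos_scores[i] is the score of an observed triple and neg_scores[i] the
    score of an unobserved triple paired with it (pairing is an assumption).
    The loss is zero once every observed triple beats its pair by `margin`.
    """
    if len(pos_scores) != len(neg_scores):
        raise ValueError("each observed triple needs exactly one paired score")
    if not pos_scores:
        raise ValueError("at least one pair of scores is required")
    losses = [max(0.0, margin - pos + neg)
              for pos, neg in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # The first pair already satisfies the margin; the second does not.
    print(margin_ranking_loss([2.0, 1.0], [0.5, 0.5]))
```

Minimizing this quantity pushes observed triples above unobserved ones by at least the margin, which is the ranking behavior the review describes.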
A latent factor model for highly multi-relational data
Many kinds of data, such as social networks, movie preferences, or knowledge bases, are multi-relational, in that they describe multiple relations between entities. While there is a large body of work focused on modeling these data, modeling these multiple types of relations jointly remains challenging. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures various orders of interaction of the data, and also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results. Finally, an NLP application demonstrates our scalability and the ability of our model to learn efficient and semantically meaningful verb representations.
- Europe > France > Île-de-France > Paris > Paris (0.04)
- South America > Brazil (0.04)
- Oceania > Australia (0.04)
- Information Technology > Communications (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
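The bilinear structure with shared sparse latent factors described in the abstract of "A latent factor model for highly multi-relational data" can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: each relation is assumed to be a sparse weighted sum of rank-one factors `(u, v)` shared by all relations, and only the bilinear subject-relation-object term is shown. The names `bilinear_score`, `factors`, and `weights` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def bilinear_score(subject, obj, weights, factors):
    """Score subject^T R obj, where R = sum_k weights[k] * u_k v_k^T.

    factors: list of (u_k, v_k) pairs shared across all relations.
    weights: sparse dict {factor index: weight} for one relation.
    The matrix R is never built; each rank-one term contributes
    weights[k] * (subject . u_k) * (v_k . obj).
    """
    return sum(w * dot(subject, factors[k][0]) * dot(factors[k][1], obj)
               for k, w in weights.items())


if __name__ == "__main__":
    # Toy shared factors and two relations that reuse them (made-up values).
    factors = [([1.0, 0.0], [0.0, 1.0]),
               ([0.0, 1.0], [1.0, 0.0]),
               ([1.0, 1.0], [1.0, 1.0])]
    relations = {"likes": {0: 2.0}, "knows": {0: 1.0, 2: 0.5}}
    s, o = [1.0, 2.0], [3.0, 4.0]
    for name, weights in relations.items():
        print(name, bilinear_score(s, o, weights, factors))
```

Because every relation only stores a few weights over the shared factors, the parameter count grows slowly with the number of relations, which is one plausible reading of how such a model could handle thousands of relations.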
Translating Embeddings for Modeling Multi-relational Data
We consider the problem of embedding entities and relationships of multi-relational data in low-dimensional vector spaces. Our objective is to propose a canonical model which is easy to train, contains a reduced number of parameters and can scale up to very large databases. Hence, we propose TransE, a method which models relationships by interpreting them as translations operating on the low-dimensional embeddings of the entities. Despite its simplicity, this assumption proves to be powerful since extensive experiments show that TransE significantly outperforms state-of-the-art methods in link prediction on two knowledge bases. Besides, it can be successfully trained on a large-scale data set with 1M entities, 25k relationships and more than 17M training samples.
- North America > United States > Pennsylvania > Bucks County (0.14)
- North America > United States > New Jersey > Ocean County (0.14)
- North America > United States > New Jersey > Atlantic County (0.14)
- Media > Film (1.00)
- Leisure & Entertainment > Sports (0.68)
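The translation idea in the TransE abstract above (a relationship acts as a translation on entity embeddings) can be sketched in a few lines. This is a toy illustration, not the authors' code: the Euclidean distance, the two-dimensional embeddings, and the names `translation_distance` and `rank_tails` are assumptions made for the example.

```python
import math


def translation_distance(head, relation, tail):
    """Euclidean distance between head + relation and tail.

    A small distance means the triple fits the translation assumption
    head + relation ~ tail. Using the L2 norm is an assumption here.
    """
    return math.sqrt(sum((h + r - t) ** 2
                         for h, r, t in zip(head, relation, tail)))


def rank_tails(head, relation, entities):
    """Return entity names ordered from most to least plausible tail."""
    return sorted(entities,
                  key=lambda name: translation_distance(head, relation,
                                                        entities[name]))


if __name__ == "__main__":
    # Hand-picked toy embeddings (not learned) for illustration only.
    entities = {"paris": [1.0, 0.0],
                "france": [1.0, 1.0],
                "brazil": [3.0, 2.0]}
    capital_of = [0.0, 1.0]
    print(rank_tails(entities["paris"], capital_of, entities))
```

Link prediction in this setting amounts to ranking candidate entities by such a distance; in practice the embeddings would be learned, for instance with a margin-based ranking loss.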
A latent factor model for highly multi-relational data
Many kinds of data, such as social networks, movie preferences, or knowledge bases, are multi-relational, in that they describe multiple relationships between entities. While there is a large body of work focused on modeling these data, few have considered modeling these multiple types of relationships jointly. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures the various orders of interaction of the data, but also shares sparse latent factors across different relations. We illustrate the performance of our approach on standard tensor-factorization datasets where we attain, or outperform, state-of-the-art results.
Disentangle-based Continual Graph Representation Learning
Kou, Xiaoyu, Lin, Yankai, Liu, Shaobo, Li, Peng, Zhou, Jie, Zhang, Yan
Graph embedding (GE) methods embed nodes (and/or edges) of a graph into a low-dimensional semantic space, and have shown their effectiveness in modeling multi-relational data. However, existing GE models are not practical in real-world applications since they overlook the streaming nature of incoming data. To address this issue, we study the problem of continual graph representation learning, which aims to continually train a GE model on new data to learn incessantly emerging multi-relational data while avoiding catastrophic forgetting of previously learned knowledge. Moreover, we propose a disentangle-based continual graph representation learning (DiCGRL) framework inspired by the human ability to learn procedural knowledge. The experimental results show that DiCGRL can effectively alleviate the catastrophic forgetting problem and outperforms state-of-the-art continual learning models. The code and datasets are released on https://github.com/KXY-PUBLIC/DiCGRL.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Education (0.46)
- Leisure & Entertainment (0.46)
A latent factor model for highly multi-relational data
Jenatton, Rodolphe, Roux, Nicolas L., Bordes, Antoine, Obozinski, Guillaume R.
Many kinds of data, such as social networks, movie preferences, or knowledge bases, are multi-relational, in that they describe multiple relationships between entities. While there is a large body of work focused on modeling these data, few have considered modeling these multiple types of relationships jointly. Further, existing approaches tend to break down when the number of these types grows. In this paper, we propose a method for modeling large multi-relational datasets, with possibly thousands of relations. Our model is based on a bilinear structure, which captures the various orders of interaction of the data, but also shares sparse latent factors across different relations.
Clustering as an Evaluation Protocol for Knowledge Embedding Representation of Categorised Multi-relational Data in the Clinical Domain
Learning knowledge representations is an increasingly important technology applicable in many domain-specific machine learning problems. We discuss the effectiveness of the traditional Link Prediction or Knowledge Graph Completion evaluation protocol when embedding knowledge representations for categorised multi-relational data in the clinical domain. Link prediction requires splitting the data into training and evaluation subsets, leading to loss of information during training and harming the accuracy of the knowledge representation model. We propose a Clustering Evaluation Protocol as a replacement for the traditionally used evaluation tasks. We used embedding models trained by a knowledge embedding approach which has been evaluated with clinical datasets. Experimental results with Pearson and Spearman correlations show strong evidence that the proposed evaluation protocol is potentially able to replace link prediction.
- Europe > Switzerland > Zürich > Zürich (0.14)
- South America > Brazil (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
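The clinical-domain abstract above reports Pearson and Spearman correlations. As a minimal sketch, assuming these are computed between two lists of per-model scores (for instance a clustering-based score and a link-prediction score, which is a guess about the setup rather than something the abstract states), the two coefficients can be computed as follows. The input values in the example are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    clustering_scores = [0.2, 0.4, 0.5, 0.9]   # made-up numbers
    link_pred_scores = [0.1, 0.3, 0.6, 0.7]    # made-up numbers
    print(pearson(clustering_scores, link_pred_scores))
    print(spearman(clustering_scores, link_pred_scores))
```

Pearson measures linear agreement between the two score lists, while Spearman only checks whether they order the models the same way.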
Fast Linear Model for Knowledge Graph Embeddings
Joulin, Armand, Grave, Edouard, Bojanowski, Piotr, Nickel, Maximilian, Mikolov, Tomas
This paper shows that a simple baseline based on a Bag-of-Words (BoW) representation learns surprisingly good knowledge graph embeddings. By casting knowledge base completion and question answering as supervised classification problems, we observe that modeling co-occurrences of entities and relations leads to state-of-the-art performance with a training time of a few minutes using the open-source library fastText.
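One way to read "casting knowledge base completion as supervised classification" is to treat the head entity and relation as a bag of input tokens and the tail entity as the class label. The sketch below only prepares training lines in fastText's supervised text format, where labels carry the `__label__` prefix; the choice of tail-as-label and the helper name `triple_to_fasttext_line` are assumptions, and the paper's exact preprocessing may differ.

```python
def triple_to_fasttext_line(head, relation, tail):
    """Format one (head, relation, tail) triple as a fastText training line.

    The tail becomes the label and the head and relation become the
    bag-of-words input. Whitespace inside a name is replaced by "_" so
    that each name stays a single token.
    """
    def token(name):
        return "_".join(name.split())

    return f"__label__{token(tail)} {token(head)} {token(relation)}"


if __name__ == "__main__":
    triples = [("Paris", "capital of", "France"),
               ("Brasilia", "capital of", "Brazil")]
    for h, r, t in triples:
        print(triple_to_fasttext_line(h, r, t))
```

Lines in this form could be written to a text file and passed to fastText's supervised training mode, which learns embeddings for the input tokens as a by-product of classification.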
Propositionalization for Unsupervised Outlier Detection in Multi-Relational Data
Riahi, Fatemeh (Simon Fraser University) | Schulte, Oliver (Simon Fraser University)
We develop a novel propositionalization approach to unsupervised outlier detection for multi-relational data. Propositionalization summarizes the information from multi-relational data, which is typically stored in multiple tables, in a single data table. The columns in the data table represent conjunctive relational features that are learned from the data. An advantage of propositionalization is that it facilitates applying the many previous outlier detection methods that were designed for single-table data. We show that conjunctive features for outlier detection can be learned from data using statistical-relational methods. Specifically, we apply Markov Logic Network structure learning. Compared to baseline propositionalization methods, Markov Logic propositionalization produces the most compact data tables, whose attributes capture the most complex multi-relational correlations. We apply three representative outlier detection methods (LOF, KNN, and OutRank) to the data tables constructed by propositionalization.
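Propositionalization as described above, where several relational tables are summarized into one table whose columns are conjunctive relational features, can be illustrated with a toy sketch. Here the features are written by hand rather than learned by Markov Logic Network structure learning, and each cell holds the number of groundings that satisfy the conjunction for that individual; the count semantics, the toy tables, and all names are assumptions made for the example.

```python
def propositionalize(individuals, features):
    """Build a single table: one row per individual, one column per feature.

    features maps a column name to a function that counts, for a given
    individual, the groundings satisfying a conjunctive relational feature.
    """
    return {ind: {name: count(ind) for name, count in features.items()}
            for ind in individuals}


# Two toy relational tables (hypothetical data).
RATINGS = [("alice", "m1", 5), ("alice", "m2", 2), ("bob", "m1", 4)]
GENRE = {"m1": "drama", "m2": "comedy"}

FEATURES = {
    # Rated(u, m, s) AND Genre(m) = drama AND s >= 4
    "rated_drama_high": lambda u: sum(
        1 for user, movie, score in RATINGS
        if user == u and GENRE[movie] == "drama" and score >= 4),
    # Rated(u, m, s) AND Genre(m) = comedy
    "rated_comedy": lambda u: sum(
        1 for user, movie, _ in RATINGS
        if user == u and GENRE[movie] == "comedy"),
}

if __name__ == "__main__":
    table = propositionalize(["alice", "bob"], FEATURES)
    for user, row in table.items():
        print(user, row)
```

The resulting rows are ordinary fixed-length attribute vectors, so a single-table outlier detector could be applied to them directly.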
Combining Two and Three-Way Embedding Models for Link Prediction in Knowledge Bases
Garcia-Duran, Alberto, Bordes, Antoine, Usunier, Nicolas, Grandvalet, Yves
This paper tackles the problem of endogenous link prediction for knowledge base completion. Knowledge bases can be represented as directed graphs whose nodes correspond to entities and edges to relationships. Previous attempts consist either of powerful systems with high capacity to model complex connectivity patterns, which unfortunately usually end up overfitting on rare relationships, or of approaches that trade capacity for simplicity in order to fairly model all relationships, frequent or not. In this paper, we propose Tatec, a happy medium obtained by complementing a high-capacity model with a simpler one, both pre-trained separately and then combined. We present several variants of this model with different kinds of regularization and combination strategies and show that this approach outperforms existing methods on different types of relationships by achieving state-of-the-art results on four benchmarks from the literature.
- North America > Canada (0.14)
- Europe > United Kingdom (0.04)
- Europe > Poland (0.04)
- Leisure & Entertainment (0.68)
- Media (0.46)
- Law (0.46)